DataFrame From XML

Apache Spark can also be used to read simple to complex nested XML files into a Spark DataFrame and to write the result back out in XML, Avro, Parquet, CSV, and JSON formats. To process XML files we use the Databricks Spark XML (spark-xml) library with Scala.

https://github.com/databricks/spark-xml 
$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.11:0.9.0

val df = spark.read
  .format("com.databricks.spark.xml")
  .option("inferSchema", "true")   // infer column types from the data (default is true)
  .option("rowTag", "property")    // each <property> element becomes one DataFrame row
  .load("test1.xml")
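The rowTag option tells spark-xml which XML element maps to a single row. For a hypothetical test1.xml like the following (the fields are placeholders, not from the original post), each <property> element becomes one row with columns id, city, and price:

<properties>
  <property><id>1</id><city>Austin</city><price>250000</price></property>
  <property><id>2</id><city>Dallas</city><price>320000</price></property>
</properties>

Since the post also mentions writing the DataFrame back to other formats, here is a minimal sketch assuming the df read above; the output paths and the rootTag value are illustrative placeholders:

// Write back to XML: rootTag wraps the whole file, rowTag wraps each row
df.write
  .format("com.databricks.spark.xml")
  .option("rootTag", "properties")
  .option("rowTag", "property")
  .save("output/properties_xml")

// The same DataFrame can be saved in the other formats mentioned above
df.write.parquet("output/properties_parquet")
df.write.json("output/properties_json")
df.write.option("header", "true").csv("output/properties_csv")

Writing Avro additionally requires the spark-avro package on the classpath.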
